102 research outputs found

    Information theoretic syllable structure and its relation to the c-center effect

    Established phonological theories postulate uniform syllable constituent structures. From a traditional hierarchical point of view, syllables are right-branching, implying a close connection between the nucleus and the coda. Articulatory Phonology, in contrast, suggests a stronger cohesion between onsets and nuclei than between nuclei and codas. This claim is empirically supported by the c-center effect, which was initially observed for onsets only. However, recent studies have revealed that this effect does not occur in all complex onsets and can also be observed in codas. To account for this structural non-uniformity, we propose an information-theoretic approach that measures connection strengths between syllable constituents in terms of their pointwise mutual information. The derived constituent structures correspond well to the empirical c-center findings on American English and German data. The results are discussed from a Usage-based Phonology perspective, considering c-centers to be a frequency effect.
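
    As a rough illustration of the measure named in this abstract, pointwise mutual information between two constituents x and y is log2(p(x, y) / (p(x) p(y))). The sketch below estimates it from raw pair counts. The function name `pmi_table` and the toy (onset, nucleus) pairs are invented for illustration and are not taken from the paper, which may estimate the probabilities differently.

```python
import math
from collections import Counter


def pmi_table(pairs):
    """Return PMI in bits for every observed (x, y) pair type.

    PMI(x, y) = log2( p(x, y) / (p(x) * p(y)) ), with all probabilities
    estimated as relative frequencies over the given list of pairs.
    """
    joint = Counter(pairs)
    left = Counter(x for x, _ in pairs)
    right = Counter(y for _, y in pairs)
    n = len(pairs)
    return {
        (x, y): math.log2((count / n) / ((left[x] / n) * (right[y] / n)))
        for (x, y), count in joint.items()
    }


# Hypothetical (onset, nucleus) observations, invented for this example.
onset_nucleus = [("st", "a"), ("st", "a"), ("k", "i"), ("k", "a")]
scores = pmi_table(onset_nucleus)
for pair, value in sorted(scores.items()):
    print(pair, round(value, 3))
```

    A positive score means the two constituents co-occur more often than independence would predict, and a negative score means less often.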

    Data-driven Extraction of Intonation Contour Classes

    In this paper we introduce the first steps towards a new data-driven method for the extraction of intonation events that does not require any prior prosodic labelling. Provided with data segmented on the syllable constituent level, it derives local and global contour classes by stylisation and subsequent clustering of the stylisation parameter vectors. Local contour classes correspond to pitch movements connected to one or several syllables and determine the local f0 shape. Global classes are connected to intonation phrases and determine the f0 register. Local classes are initially derived for syllabic segments, which are then concatenated incrementally by means of statistical language modelling of co-occurrence patterns. Due to its generality, the method is in principle language independent and potentially capable of dealing with aspects of prosody other than intonation.

    Improving Data Driven Part-of-Speech Tagging by Morphologic Knowledge Induction

    We present a Markov part-of-speech tagger for which the P(w|t) emission probabilities of word w given tag t are replaced by a linear interpolation of tag emission probabilities given a list of representations of w. As word representations, string suffixes of w are cut off at the local maxima of the Normalized Backward Successor Variety. This procedure allows for the derivation of linguistically meaningful string suffixes that may relate to certain POS labels. Since no linguistic knowledge is needed, the procedure is language independent. Basic Markov model part-of-speech taggers are significantly outperformed by our model.
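
    The interpolation step described in this abstract can be sketched as a weighted sum over the representations of a word. This is only a minimal sketch under stated assumptions: the abstract does not give the exact form of the interpolated terms or how the weights are set, so the function name `interpolated_emission`, the lookup-table layout, the fixed weights, and all probability values below are hypothetical.

```python
def interpolated_emission(representations, tag, emission, weights):
    """Linearly interpolate emission scores over a word's representations.

    representations: e.g. the full word followed by some of its suffixes
    emission:        dict mapping (representation, tag) to a probability
    weights:         interpolation weights, assumed here to sum to 1
    """
    if len(representations) != len(weights):
        raise ValueError("need exactly one weight per representation")
    return sum(
        lam * emission.get((rep, tag), 0.0)
        for rep, lam in zip(representations, weights)
    )


# Invented toy values: the full word is unseen with this tag,
# but its suffix "ing" has been observed with it.
emission = {("ing", "VBG"): 0.2}
score = interpolated_emission(["walking", "ing"], "VBG", emission, [0.7, 0.3])
print(score)
```

    The point of the design is that an unseen word still receives a non-zero emission score as long as one of its suffix representations was seen with the tag.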

    Automatisation of intonation modelling and its linguistic anchoring

    This paper presents a fully machine-driven approach to intonation description and its linguistic interpretation. For this purpose, a new intonation model for bottom-up F0 contour analysis and synthesis is introduced: the CoPaSul model, which is designed in the tradition of parametric, contour-based, and superpositional approaches. Intonation is represented by a superposition of global and local contour classes that are derived from F0 parameterisation. These classes were linguistically anchored with respect to information status by aligning them with a text which had been coarsely analysed for this purpose by means of NLP techniques. To test the adequacy of this data-driven interpretation, a perception experiment was carried out, which confirmed 80% of the findings.

    Removing micromelody from fundamental frequency contours

    In this paper we describe a new method to diminish microprosodic components of fundamental frequency contours by applying weight functions linked to microprosodically classified phone combinations. For vowel segments in obstruent environments, our algorithm outperforms standard smoothing algorithms such as Moving-Average filtering, Savitzky-Golay filtering or MOMEL in diminishing F0 variations related to microprosodic factors while retaining significant differences related to macroprosody.
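
    For orientation, the simplest of the baselines named in this abstract, moving-average filtering, can be sketched as follows. This illustrates the baseline only, not the weight-function method the paper proposes. The window size, the shrinking-window edge handling, and the F0 values are assumptions made for this example.

```python
def moving_average(f0, window=3):
    """Smooth an F0 contour with a centred moving average.

    At the edges the window is truncated to the available samples,
    which is one of several possible edge-handling choices.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(f0)):
        lo = max(0, i - half)
        hi = min(len(f0), i + half + 1)
        segment = f0[lo:hi]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


# Invented contour in Hz with a single short spike in the middle.
contour = [100.0, 100.0, 130.0, 100.0, 100.0]
print(moving_average(contour, window=3))
```

    The spike is flattened but also smeared onto its neighbours, which shows why a context-blind filter can distort the contour around a local perturbation.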

    Automated Morphological Segmentation and Evaluation

    In this paper we introduce (i) a new method for the morphological segmentation of part-of-speech-labelled German words and (ii) some measures related to the MDL principle for the evaluation of morphological segmentations. The segmentation algorithm is capable of discovering hierarchical structure and of retrieving new morphemes. It achieved 75 % recall and 99 % precision. Regarding MDL-based evaluation, a linear combination of vocabulary size and the size of reduced deterministic finite state automata matching exactly the segmentation output turned out to be an appropriate measure for ranking segmentation models according to their quality.
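
    Precision and recall for a segmentation can be computed in several ways, and the abstract does not say which unit was counted. The sketch below assumes a boundary-based count over character positions. The function name `boundary_precision_recall` and the example word are illustrative only.

```python
def boundary_precision_recall(predicted, gold):
    """Compare predicted and gold morpheme boundary positions for one word.

    Boundaries are character offsets, e.g. "un|break|able" -> {2, 7}.
    Returns (precision, recall); an empty set yields 0.0 for its measure.
    """
    pred, ref = set(predicted), set(gold)
    hits = len(pred & ref)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    return precision, recall


# Gold: un|break|able. Prediction finds only the first boundary: un|breakable.
precision, recall = boundary_precision_recall({2}, {2, 7})
print(precision, recall)
```

    A cautious segmenter that proposes few boundaries scores like this toy case: every proposed boundary is correct, but some gold boundaries are missed.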

    Comparing human and machine vowel classification

    In this study we compare the human ability to identify vowels with a machine learning approach. A perception experiment for 14 Hungarian vowels in isolation and embedded in a carrier word was conducted, and a C4.5 decision tree was trained on the same material. A comparison between the identification results of the subjects and the classifier showed that in three of four conditions (isolated vowel quantity and identity, embedded vowel identity) the performance of the classifier was superior, and in one condition (embedded vowel quantity) equal, to the subjects’ performance. This outcome can be explained by perceptual limits of the subjects and by stimulus properties. The classifier’s performance was significantly weakened by replacing the continuous spectral information with binary 3-Bark thresholds as proposed in the phonetic literature [8]. Parts of the resulting decision trees can be interpreted phonetically, which could qualify this classifier as a tool for phonetic research.

    Quantity distinction in the Hungarian vowel system - just theory or also reality?

    According to most current theories, the Hungarian vowel system involves 14 vowels that correspond to seven vowel pairs, each differentiated by quantity. However, there are phenomena on both the phonological and the phonetic level which suggest that a separate evaluation of the quantity opposition is necessary for low, mid, and high vowels. In order to test this, we conducted a perception test in which embedded and isolated vowels spoken by a native Hungarian speaker were to be identified by native listeners. The results show that the perception of vowel length and vowel quality (i.e. the formant structure) closely interact in Hungarian. Low vowels, for which short and long realisations differ in quality, i.e. in vowel height, were seldom identified incorrectly. For embedded high vowels, the subjects did not clearly treat duration as a crucial cue for identification, nor did the speaker clearly differentiate these vowels. Mid vowels showed mixed behaviour: they were differentiated with regard to duration and formant structure in production, but this information was only partly used by the listeners. The fact that the vowel quantity distinction in Hungarian is only maintained where there is a perceivable quality difference shows that the role of quantity is not as dominant as has long been assumed.